
    CCancer: a bird's eye view on gene lists reported in cancer-related studies

    CCancer is an automatically collected database of gene lists, which were reported mostly by experimental studies in various biological and clinical contexts. At the moment, the database covers 3369 gene lists extracted from 2644 papers published in ~80 peer-reviewed journals. As input, CCancer accepts a gene list. An enrichment analysis is implemented to generate, as output, a highly informative survey of recently published studies that report gene lists which significantly intersect with the query gene list. A report on gene pairs from the input list that were frequently reported together by other biological studies is also provided. CCancer is freely available at http://mips.helmholtz-muenchen.de/proj/ccancer
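
The abstract does not say which statistic the enrichment analysis uses. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for gene-list overlap), the significance of an intersection between a query list and one reported list could be computed as below. The function name, the gene symbols, and the universe size of 20000 genes are illustrative assumptions, not details taken from CCancer.

```python
from math import comb


def overlap_p_value(query, reported, universe_size):
    """Return (overlap size, P(overlap >= observed)) under a hypergeometric null.

    Assumes both lists are drawn from the same gene universe of
    `universe_size` genes; this is a sketch, not CCancer's actual procedure.
    """
    query, reported = set(query), set(reported)
    k = len(query & reported)                 # observed overlap
    n, K, N = len(query), len(reported), universe_size
    total = comb(N, n)                        # all possible query lists of size n
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return k, tail / total


# Hypothetical example lists
query_genes = ["TP53", "BRCA1", "MYC"]
reported_genes = ["TP53", "BRCA1", "EGFR", "KRAS"]
overlap, p = overlap_p_value(query_genes, reported_genes, universe_size=20000)
print(overlap, p)
```

In practice the test would be repeated against every gene list in the database, so the resulting p-values would need a multiple-testing correction.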

    The BioPAX community standard for pathway data sharing

    Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery. © 2010 Nature America, Inc. All rights reserved.

    PPI spider: A tool for the interpretation of proteomics data in the context of protein-protein interaction networks.

    Recent advances in experimental technologies allow for the detection of a complete cell proteome. Proteins that are expressed at a particular cell state or in a particular compartment, as well as proteins with differential expression between various cell states, are commonly delivered by many proteomics studies. Once a list of proteins is derived, a major challenge is to interpret the identified set of proteins in the biological context. Protein-protein interaction (PPI) data represent abundant information that can be employed for this purpose. However, these data have not yet been fully exploited due to the absence of a methodological framework that can integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider (http://mips.helmholtz-muenchen.de/proj/ppispider). To support the practical significance of PPI spider, we collected several hundred recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that are helpful for understanding the protein list.
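
The Monte Carlo idea can be illustrated with a small sketch. The abstract does not define the network model or the test statistic, so the code below assumes one simple possibility: the statistic is the size of the largest connected component that the protein list induces in the global PPI graph, and the null distribution comes from random protein lists of the same size. The function names and the toy graph are hypothetical and are not taken from PPI spider.

```python
import random


def largest_component(graph, nodes):
    """Size of the largest connected component induced by `nodes` in `graph`.

    `graph` is a symmetric adjacency mapping: protein -> set of neighbours.
    """
    nodes = set(nodes)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                          # depth-first traversal
            current = stack.pop()
            size += 1
            for neighbour in graph.get(current, ()):
                if neighbour in nodes and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def monte_carlo_p_value(graph, protein_list, n_samples=1000, seed=0):
    """Empirical p-value: how often a random list of the same size connects as well."""
    rng = random.Random(seed)
    universe = sorted(graph)                  # all proteins in the global network
    observed = largest_component(graph, protein_list)
    k = len(set(protein_list))
    hits = sum(
        largest_component(graph, rng.sample(universe, k)) >= observed
        for _ in range(n_samples)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_samples + 1)


# Toy network (assumed): a chain A-B-C-D plus 16 unconnected proteins.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy.update({f"P{i}": set() for i in range(16)})
print(monte_carlo_p_value(toy, ["A", "B", "C", "D"], n_samples=200))
```

Sampling uniformly from all proteins is the simplest null model; a degree-preserving null would be a stricter alternative, since highly connected proteins are more likely to form components by chance.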

    Beyond the 'best' match: Machine learning annotation of protein sequences by integration of different sources of information.

    Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. RESULTS: The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. Following training we scored all pairs from the validation sets, selected a pair with the highest predicted score and annotated the target protein with functional categories of the prototype protein. The data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable versus non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes. AVAILABILITY: The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at http://mips.gsf.de/proj/funat
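
The annotation-transfer step described above (score every target/prototype pair, keep the best-scoring pair, copy the prototype's categories) can be sketched as follows. This is a minimal illustration, assuming the pair scores have already been produced by a trained model; the function name, the `min_score` reliability cutoff of 0.5, and the example identifiers and category labels are hypothetical and not taken from FUNAT.

```python
def transfer_annotations(scored_pairs, prototype_categories, min_score=0.5):
    """Annotate each target with the categories of its best-scoring prototype.

    scored_pairs: iterable of (target_id, prototype_id, predicted_score).
    prototype_categories: mapping prototype_id -> list of functional categories.
    min_score: assumed cutoff separating reliable from non-reliable annotations.
    """
    best = {}
    for target, prototype, score in scored_pairs:
        # Keep only the highest-scoring pair seen so far for this target.
        if target not in best or score > best[target][1]:
            best[target] = (prototype, score)
    return {
        target: {
            "prototype": prototype,
            "categories": prototype_categories.get(prototype, []),
            "score": score,
            "reliable": score >= min_score,
        }
        for target, (prototype, score) in best.items()
    }


# Hypothetical scores and category labels
pairs = [
    ("target1", "protoA", 0.91),
    ("target1", "protoB", 0.40),
    ("target2", "protoB", 0.30),
]
categories = {"protoA": ["category_X"], "protoB": ["category_Y", "category_Z"]}
for target, annotation in transfer_annotations(pairs, categories).items():
    print(target, annotation)
```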

    Book reviews


    PPISURV: a novel bioinformatics tool for uncovering the hidden role of specific genes in cancer survival outcome

    Multiple clinical studies have correlated gene expression with survival outcome in cancer on a genome-wide scale. However, in many cases, no obvious correlation between expression of well-known tumour-related genes (for example, p53, p73 and p21) and survival rates of patients has been observed. This can be mainly explained by the complex molecular mechanisms involved in cancer, which mask the clinical relevance of a gene with multiple functions if only gene expression status is considered. As we demonstrate here, in many such cases, the expression of the gene interaction partners (gene 'interactome') correlates significantly with cancer survival and is indicative of the role of that gene in cancer. On the basis of this principle, we have implemented a free online datamining tool (http://www.bioprofiling.de/PPISURV). PPISURV automatically correlates expression of an input gene interactome with survival rates on >40 publicly available clinical expression data sets covering various tumours involving about 8000 patients in total. To derive the query gene interactome, PPISURV employs several public databases including protein-protein interactions, regulatory and signalling pathways and protein post-translational modifications.
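
The abstract does not state how interactome expression is correlated with survival. One plausible sketch, assuming a per-patient mean of interactome gene expression, a median split into high and low groups, and a two-group log-rank test, is shown below. All function names and the patient record layout (`expression`, `time`, `event`) are assumptions for illustration and do not describe PPISURV's implementation.

```python
from math import erfc, sqrt
from statistics import mean, median


def interactome_score(expression, interactome):
    """Mean expression of the interactome genes measured for one patient."""
    return mean(expression[g] for g in interactome if g in expression)


def logrank_p_value(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) tuples; event is True for an
    observed death and False for a censored observation.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _ in group_a if ti >= t)      # at risk in A
        n_b = sum(1 for ti, _ in group_b if ti >= t)      # at risk in B
        d_a = sum(1 for ti, e in group_a if e and ti == t)
        d_b = sum(1 for ti, e in group_b if e and ti == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return erfc(sqrt(chi2 / 2))               # chi-square tail, 1 degree of freedom


def interactome_survival_p(patients, interactome):
    """Split patients at the median interactome score and compare survival."""
    scores = [interactome_score(p["expression"], interactome) for p in patients]
    cutoff = median(scores)
    high = [(p["time"], p["event"]) for p, s in zip(patients, scores) if s > cutoff]
    low = [(p["time"], p["event"]) for p, s in zip(patients, scores) if s <= cutoff]
    return logrank_p_value(high, low)
```

A call such as `interactome_survival_p(patients, ["GENE1", "GENE2"])` would return a p-value for one data set; the gene names here are placeholders, and repeating the test across many data sets would call for a multiple-testing correction.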